Fast Load

Teradata FastLoad is a command-driven utility that can be used to quickly load large amounts of data into an empty table on a Teradata Database. Part of this speed is achieved because it does not use the Transient Journal. But, regardless of the reasons that it is fast, know that FastLoad was developed to load millions of rows into empty Teradata tables. FastLoad loads data into EMPTY Teradata tables in 64 K blocks. The only command FastLoad understands is INSERT!

Data can be loaded from:
  • Disk or tape files on a mainframe-attached client system
  • Input files on a network-attached workstation
  • Special input module (INMOD) routines written to select, validate, and preprocess input data
  • Any other device providing properly formatted source data
Teradata FastLoad uses multiple sessions to load data. However, it loads data into only one table on a Teradata Database per job. To load data into more than one table in the Teradata Database, multiple Teradata FastLoad jobs must be submitted, one for each table.

FastLoad Restrictions
  • Only a Primary Index is allowed on the target table in FastLoad.
  • No Secondary Indexes are allowed on the target table.
  • No Referential Integrity is allowed. FastLoad cannot load data into tables that are defined with Referential Integrity (RI), because enforcing referential constraints against another table would require too much system checking. In short, RI constraints must be dropped from the target table prior to the use of FastLoad.
  • No Triggers are allowed at load time. FastLoad is much too focused on speed to pay attention to the needs of other tables, which is what Triggers are all about. Simply ALTER the Triggers to the DISABLED status prior to using FastLoad.
  • No AMPs may go down (i.e., go offline) while FastLoad is processing. The down AMP must be repaired before the load process can be restarted.
  • No more than one data type conversion is allowed per column during a FastLoad. Why just one? Data type conversion is a highly resource-intensive job for the system, and each conversion takes more time.
  • FastLoad loads data in 64 K blocks, and by default only 30 FastLoad and MultiLoad jobs combined can run simultaneously.
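The RI and Trigger restrictions above can be dealt with in SQL before the load. A minimal sketch, assuming a named foreign-key constraint and a trigger exist on the target table (the names fk_emp_dept and emp_audit_trg are hypothetical, for illustration only):

```sql
/* Drop a named RI constraint from the target table
   (constraint name fk_emp_dept is hypothetical). */
ALTER TABLE retail.emp_test DROP CONSTRAINT fk_emp_dept;

/* Disable, rather than drop, a trigger on the target table
   (trigger name emp_audit_trg is hypothetical). */
ALTER TRIGGER emp_audit_trg DISABLED;
```

After the load completes, the constraint can be re-added and the trigger set back to ENABLED.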
  Teradata FastLoad supports the following data formats:
  • Formatted
  • Unformatted
  • Binary
  • Text
  • Variable-length text
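The input format is declared to FastLoad with the SET RECORD command; one of the following forms is used per job, matching the actual layout of the data file (the comma delimiter shown is illustrative):

```sql
SET RECORD FORMATTED;     /* Teradata-standard formatted records   */
SET RECORD UNFORMATTED;   /* unformatted records                   */
SET RECORD BINARY;        /* binary records                        */
SET RECORD TEXT;          /* newline-terminated text records       */
SET RECORD VARTEXT ",";   /* variable-length, comma-delimited text */
```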
Two Phases of Fast Load
  • Acquisition Phase
  • Application Phase



The job of Phase 1 is to get the data off the mainframe or server and move it over the network into Teradata. The data moves in 64 K blocks and is stored in worktables on the AMPs.

The data rows are not on the correct AMPs yet! When all of the data has been moved from the server or mainframe flat file, each AMP hashes its worktable rows so that each row is transferred to the worktable on its proper destination AMP.

Acquisition Phase 
The primary function of Phase 1 is to transfer data from the host computer to the Access Module Processors (AMPs) as quickly as possible. For the sake of speed, the Parsing Engine of Teradata does not take the time to hash each row of data based on the Primary Index. That will be done later. Instead, it does the following:

  • When the Parsing Engine (PE) receives the INSERT command, it uses one session to parse the SQL just once. It then opens Teradata sessions from the FastLoad client directly to the AMPs. By default, one session is created for each AMP.
  • Therefore, on large systems it is normally a good idea to limit the number of sessions using the SESSIONS command. Simultaneously, all but one of the client sessions begin loading raw data in 64 K blocks for transfer to the AMPs. The first priority of Phase 1 is to get the data onto the AMPs as fast as possible. To accomplish this, the rows are packed, unhashed, into large blocks and sent to the AMPs without any concern for which AMP gets which block.
  • The result is that data rows arrive on AMPs other than the ones where they would live had they been hashed. So how do the rows get to the correct AMPs where they will permanently reside? Following the receipt of every data block, each AMP hashes its rows based on the Primary Index and redistributes them to the proper AMPs. At this point, the rows are written to a worktable on the destination AMP but remain unsorted until Phase 1 is complete.

Phase 1 can be compared loosely to the preferred method of transfer used in the parcel shipping industry today. How do the key players in this industry handle a parcel? When the shipping company receives a parcel, that parcel is not immediately sent to its final destination. Instead, for the sake of speed, it is often sent to a shipping hub in a seemingly unrelated city. Then, from that hub it is sent to the destination city.

Application Phase
The shipping vendor must do more than get a parcel to the destination city. Once the packages arrive at the destination city, they must then be sorted by street and zip code, placed onto local trucks and be driven to their final, local destinations. Similarly, FastLoad’s Phase 2 is mission critical for getting every row of data to its final address (i.e., where it will be stored on disk). In this phase, each AMP sorts the rows in its worktable. Then it writes the rows into the table space on disks where they will permanently reside.

Rows of a table are stored on the disks in data blocks. The AMP uses the block size as defined when the target table was created. If the table is Fallback protected, then the Fallback will be loaded after the Primary table has finished loading. This enables the Primary table to become accessible as soon as possible.




A Sample FastLoad Script
The following sections show an example of a FastLoad script. This script is designed to INSERT into an empty Teradata table called emp_test in the database retail. FastLoad will first log on. Then it will build the table structure (unless it already exists and is empty). Then it will begin loading, but it will always define two error tables. A checkpoint is optional. Then the INSERT is performed and we are done.

How to Write a FastLoad Script

Section 1
In this section we give the LOGON credentials required to connect to the Teradata system. The SESSIONS command is used to restrict the number of sessions FastLoad will open to Teradata. The default is one session per AMP.

.SESSIONS 4;

.LOGON 127.0.0.1/tduser,tduser;


Section 2

In this section we define the table that we want to load with FastLoad. The DROP statements are optional. There is no need to define the structure of the error tables; FastLoad creates them itself.

drop table retail.emp_test;
drop table retail.emp_test_er1;
drop table retail.emp_test_er2;

create table retail.emp_test
(
emp_id integer not null,
emp_name varchar(50),
dept_id integer,
salary integer,
dob date format 'yyyymmdd'
)
unique primary index(emp_id);


Section 3

In this section we give the BEGIN LOADING statement. As soon as FastLoad receives this statement, it starts Phase 1.

BEGIN LOADING retail.emp_test
ERRORFILES retail.emp_test_er1, retail.emp_test_er2;

Section 4 

The RECORD command is used to start processing at a given record, skipping the leading rows of the data file; RECORD THRU is used to stop at a given record, skipping the trailing rows. The SET RECORD command is used to define the record layout, and "," is the delimiter used in our data file to separate columns.

.RECORD 1;

.RECORD THRU 3;

SET RECORD VARTEXT ",";



Section 5 
The DEFINE statement is used to define the structure of the records in the data file, in the same order as the columns of the target table. With VARTEXT input, the FastLoad DEFINE statement allows only the VARCHAR format.

DEFINE
emp_id (VARCHAR(9)),
emp_name (VARCHAR(50)),
dept_id (VARCHAR(9)),
salary (VARCHAR(9)),
dob (VARCHAR(50))



Section 6
The FILE specification, which completes the DEFINE statement, gives the data file path and name.
FILE = C:\fload_data.txt;

Section 7
The INSERT statement is used to load the data file into the actual target table. NOTE: for DATE columns, a data type conversion can be applied with the syntax given below.



INSERT INTO retail.emp_test
VALUES
(
:emp_id,
:emp_name,
:dept_id,
:salary,
:dob (format 'yyyymmdd')
);


Section 8 
END LOADING ends Phase 1 and starts the execution of Phase 2. LOGOFF is required to close all the sessions created by FastLoad.


END LOADING;

.LOGOFF;


How to Run FastLoad Scripts
fastload < script_name > fload.log   (writes the log to a file)

or

fastload < script_name   (displays the log on the terminal)

 
Sample Script

BEGIN LOADING tduser.EMP_FL
 ERRORFILES TDUSER.EMP_FL_ER1,
  TDUSER.EMP_FL_ER2
  ;
 
SET RECORD VARTEXT ","; 
 
DEFINE

EmpNo        (VARCHAR(15)),
Name         (VARCHAR(18)),
Phone        (VARCHAR(15)),
DeptNo       (VARCHAR(5)),
Salary       (VARCHAR(10)),
YrsExp       (VARCHAR(2))

FILE = C:\Users\Pundrik\Desktop\DATA\EMP_MFI.txt;

insert into tduser.EMP_FL
(
 EmpNo
,Name
,Phone
,DeptNo
,Salary
,YrsExp

)
values
(
 :EmpNo
,:Name
,:Phone
,:DeptNo
,:Salary
,:YrsExp

)
;

end loading;
.logoff;

FastLoad's restart capability is the reason it will not load duplicate rows, even into a MULTISET table. When a FastLoad job fails, some number of rows has already been sent to the AMPs. If the job is then restarted, it resumes loading from the last checkpoint, so some of those rows are sent a second time; after the data is sorted, these resent rows show up as duplicates. Because FastLoad cannot distinguish a resent row from a genuine duplicate, it discards all duplicate rows it finds. In short, FastLoad can load data into a MULTISET table (a table that allows duplicate rows), but it will not load the duplicate rows; it discards them.
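To make the MULTISET point concrete, here is a sketch (the table name and columns are illustrative, not from the original): even though the table itself permits duplicate rows, any exact duplicates in the FastLoad input file are discarded during the load.

```sql
/* A MULTISET table normally accepts duplicate rows through SQL,
   but FastLoad will still discard any exact-duplicate input rows. */
create multiset table retail.emp_multi
(
emp_id integer,
emp_name varchar(50)
)
primary index(emp_id);
```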

Can I use a "drop" statement in the "fload" utility?
Yes, but it must be declared outside the FastLoad block, i.e., it must not appear between BEGIN LOADING and END LOADING. FastLoad also supports DELETE, CREATE, and DROP statements declared outside the block; inside the FastLoad block, only INSERT can be given.

1st Error table data

select * from TDUSER.EMP_FL_ER1;

The first error table captures rows rejected during the acquisition phase, for example when:
  1. A record/parcel length does not match the defined layout
  2. The data in a field cannot be converted to the target column's data type
  3. A row violates a constraint on the target table
  4. The number of input values does not match the number of column names

2nd Error table data

select * from TDUSER.EMP_FL_ER2;

The second error table captures duplicate record information: rows with duplicate values for the unique primary index, detected during the application phase after the data is sorted.
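Each error-table row records what failed and why. The standard FastLoad error-table layout includes ErrorCode, ErrorFieldName, and DataParcel columns, so a query along these lines (a sketch, assuming that standard layout) shows which field caused each rejection:

```sql
/* Inspect the first error table: ErrorCode identifies the failure,
   ErrorFieldName names the offending field, and DataParcel holds
   the raw rejected record. */
SELECT ErrorCode, ErrorFieldName
FROM TDUSER.EMP_FL_ER1;
```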
Restarting FastLoad Job
Condition 1: Abort in Phase 1 - data acquisition incomplete.
Solution: Resubmit the script.  FastLoad will begin from record 1 or the first record past the last checkpoint.

Condition 2: Abort occurs in Phase 2 - data acquisition complete.
Solution: Submit only BEGIN and END LOADING statements; restarts Phase 2 only.

Condition 3: Normal end of Phase 1 (paused) - more data to acquire, thus there is no 'END LOADING' statement in script.
Solution:  Resubmit the adjusted script with new data file name.  FastLoad will be positioned to record 1 or the first record past the last checkpoint.

Condition 4: Normal end of Phase 1  (paused) - no more data to acquire, no 'END LOADING' statement was in the script.
Solution:  Submit BEGIN and END LOADING statements; restarts Phase 2 only.
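For Conditions 2 and 4, the Phase 2-only restart job is little more than the paired BEGIN and END LOADING statements. A minimal sketch, reusing the logon, table, and error-table names from the sample script above:

```sql
.LOGON 127.0.0.1/tduser,tduser;

BEGIN LOADING retail.emp_test
ERRORFILES retail.emp_test_er1, retail.emp_test_er2;

END LOADING;

.LOGOFF;
```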

